Abstract. By replacing linear averaging in Shannon entropy with the Kolmogorov-Nagumo
average (KN-average), or quasilinear mean, and further imposing the additivity
constraint, Renyi proposed the first formal generalization of Shannon entropy.
Using this recipe of Renyi, one can prepare only two information measures:
Shannon and Renyi entropy. Indeed, using this formalism Renyi characterized
these additive entropies in terms of axioms of the quasilinear mean. Just as additivity
is a characteristic property of Shannon entropy, pseudo-additivity of the form
$x \oplus_q y = x + y + (1 - q)xy$ is a characteristic property of nonextensive (or Tsallis)
entropy. One can apply Renyi's recipe in the nonextensive case by replacing the linear
averaging in Tsallis entropy with KN-averages and then imposing the constraint of
pseudo-additivity. In this paper we show that nonextensive entropy is unique under
Renyi's recipe, and thereby give a characterization.
1. Introduction

In recent years, interest in generalized information measures has increased dramatically,
following the introduction of nonextensive entropy in Physics in 1988 by Tsallis [?]. One
can obtain this nonextensive entropy, or Tsallis entropy, by generalizing the information of
a single event in the definition of Shannon entropy: the logarithm is replaced with the
so-called q-logarithm, defined as $\ln_q x = \frac{x^{1-q} - 1}{1 - q}$. Tsallis entropy does not
satisfy the additivity property, which is a characteristic property of Shannon entropy. Instead,
it satisfies pseudo-additivity of the form $x \oplus_q y = x + y + (1 - q)xy$, and this definition of
entropy (also known as nonextensive entropy) led to the field of nonextensive statistical
mechanics in Physics. In this paper we use the term pseudo-addition for the
binary operation $x \oplus_q y = x + y + (1 - q)xy$ for any real $q > 0$.

Tsallis entropy is considered a useful measure for describing the thermostatistical
properties of a certain class of physical systems that entail long-range interactions,
long-term memories and multi-fractal structures. Tsallis entropy is also studied in
information theory, and the Shannon-Khinchin axioms have been generalized to the
nonextensive case. While canonical distributions resulting from maximization of Shannon
entropy are exponential in nature, in the Tsallis case they are power-law distributions. To
a great extent, the success of Tsallis's proposal is due to the ubiquity of power-law
distributions in nature.

Indeed, the starting point of the theory of generalized measures of information is
due to Alfred Renyi [?]. By using Kolmogorov-Nagumo averages (KN-averages) Renyi
introduced a generalized information measure, known as α-entropy or Renyi entropy, the
first formal well-known generalization of Shannon entropy. The KN-average or quasilinear
mean (we use these two terms interchangeably) is of the form
$\langle x \rangle_\psi = \psi^{-1}\left( \sum_k p_k \psi(x_k) \right)$,
where $\psi$ is an arbitrary continuous and strictly monotone function. Replacing linear
averaging in Shannon entropy with KN-averages and further imposing the additivity
constraint - a characteristic property of the underlying information associated with a single
event, which is logarithmic - leads to Renyi entropy. Using this recipe of Renyi,
one can prepare only two information measures: Shannon and Renyi entropy. Using
this formalism Renyi characterized these additive entropies in terms of axioms of
KN-averages.

One can apply Renyi's recipe in the nonextensive case by replacing the linear 
averaging in Tsallis entropy with KN-averages and thereby imposing the constraint 
of pseudo-additivity. A natural question arises: what are all the pseudo-additive 
information measures one can prepare with this recipe? We prove that only Tsallis 
entropy is possible in this case, which allows us to characterize Tsallis entropy based on 
axioms of KN-averages. 

To understand these generalizations, the so-called Hartley function [?] of a single
stochastic event plays a fundamental role. We discuss the Hartley function in §2 along with
a brief discussion of the quasilinear mean and Renyi entropy. The main results of this paper,
on the uniqueness of Tsallis entropy under Renyi's recipe and on a characterization
of Tsallis entropy, are presented in §3 and §4 respectively.

2. KN-averages and Information measures 
2.1. Hartley Function and Shannon Entropy 

Let X be a discrete random variable (r.v.) defined on some probability space, which
takes only n values, $n < \infty$. We denote the set of all such random variables by $\mathcal{X}$.
Corresponding to the n-tuple $(x_1, \ldots, x_n)$ of values which X takes, the probability mass
function (pmf) of X is denoted by $p = (p_1, \ldots, p_n)$, where $p_k \geq 0$ for $k = 1, \ldots, n$ and
$\sum_{k=1}^{n} p_k = 1$. The expectation of the r.v. X is denoted by $\mathsf{E}X$ or $\langle X \rangle$; in this paper
we use both notations interchangeably.

Shannon entropy, a logarithmic measure of information on X denoted by $S(X)$,
reads [?]

$$ S(X) = -\sum_{k=1}^{n} p_k \ln p_k , \tag{1} $$

and measures the average lack of information that is inherent in p.

This motivation to quantify information in terms of logarithmic functions is due to
Hartley, who first used a logarithmic function to define the uncertainty associated with
a finite set. This is known as the Hartley information measure. The Hartley information
measure of a finite set A with n elements is defined as $H(A) = \log_b n$. If the base of
the logarithm is 2, then the uncertainty is measured in bits, and in the case of the natural
logarithm, the unit is nats. Throughout this paper we use only the natural logarithm as a
convention.

One can give a more general definition of the Hartley information measure, which is a
special case of Shannon entropy, as follows. Define a function
$H : \{x_1, \ldots, x_n\} \to \mathbb{R}$ of the values taken by the r.v. $X \in \mathcal{X}$ with
corresponding pmf $p = (p_1, \ldots, p_n)$ as [?]

$$ H(x_k) = \ln \frac{1}{p_k} , \quad \forall k = 1, \ldots, n. \tag{2} $$

H is also known as the entropy of a single event and plays an important role in all classical
measures of information. It can be interpreted either as a measure of how unexpected
the event was, or as a measure of the information yielded by the event. The Hartley function
satisfies: (i) H is nonnegative: $H(x_k) \geq 0$; (ii) H is additive: $H(x_i x_j) = H(x_i) + H(x_j)$;
(iii) H is normalized: $H(x_k) = 1$ whenever $p_k = 1/e$ (in the case of the logarithm with base
2, the same is satisfied for $p_k = 1/2$). These properties are both necessary and sufficient [?].
Now, Shannon entropy can be written as the expectation of the Hartley function as

$$ S(X) = \langle H \rangle = \sum_{k=1}^{n} p_k H_k , \tag{3} $$

where $H_k = H(x_k)$, $\forall k = 1, \ldots, n$, with the understanding that $\langle H \rangle = \langle H(X) \rangle$.
The characteristic additive property of Shannon entropy

$$ S(X \times Y) = S(X) + S(Y) , \tag{4} $$

for two independent random variables X and Y now follows as a consequence of the
additivity property of the Hartley function.
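Equations (3) and (4) can be checked numerically. The following is a minimal sketch; the function names and the two example pmfs are illustrative assumptions, not taken from the paper.

```python
import math

def hartley(p):
    """Hartley information ln(1/p), in nats, of an event with probability p > 0."""
    return math.log(1.0 / p)

def shannon(pmf):
    """Shannon entropy as the probability-weighted linear mean of the Hartley function."""
    return sum(p * hartley(p) for p in pmf if p > 0)

def product_pmf(p, r):
    """Joint pmf of two independent random variables with pmfs p and r."""
    return [pi * rj for pi in p for rj in r]

# Illustrative pmfs (arbitrary choices, not from the paper).
p = [0.5, 0.3, 0.2]
r = [0.6, 0.4]

lhs = shannon(product_pmf(p, r))  # S(X x Y)
rhs = shannon(p) + shannon(r)     # S(X) + S(Y)
print(lhs, rhs)                   # the two values agree up to rounding
```

The agreement of the two printed values reflects the additivity of `hartley` on products of probabilities.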

There are two postulates involved in defining Shannon entropy as expectation 
of Hartley function. One is the additivity of information which is the characteristic 
property of Hartley function, and the other is that if different amounts of information 
occur with different probabilities, the total information will be the average of the 
individual informations weighted by the probabilities of their occurrences. 

The basic idea behind Renyi's generalization is that any putative candidate for an
entropy should be a mean, combined with the well-known idea in mathematics that
the linear mean, though most widely used, is not the only possible way of averaging:
one can define the mean with respect to an arbitrary function. Here we briefly
discuss generalized averages and their properties, which are essential for the results we
present in this paper.

2.2. Kolmogorov-Nagumo Averages or Quasilinear Mean 

In the general theory of means, the quasilinear mean of a random variable X is defined as§

$$ \mathsf{E}_\psi X = \langle X \rangle_\psi = \psi^{-1}\left( \sum_{k=1}^{n} p_k \psi(x_k) \right) , \tag{5} $$

where $\psi$ is continuous and strictly monotonic (increasing or decreasing), in which
case it has an inverse $\psi^{-1}$ which satisfies the same conditions. In the context of
generalized means, $\psi$ is referred to as the Kolmogorov-Nagumo function or KN-function.
If, in particular, $\psi$ is linear, then (5) reduces to the expression of linear averaging.
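As a sketch of definition (5), the quasilinear mean can be computed by passing a KN-function and its inverse explicitly. The helper name `kn_mean` and the sample values are assumptions made for illustration; the familiar arithmetic, geometric and harmonic means appear as special cases.

```python
import math

def kn_mean(xs, ps, psi, psi_inv):
    """Quasilinear (KN) mean: psi_inv(sum_k p_k * psi(x_k))."""
    return psi_inv(sum(p * psi(x) for x, p in zip(xs, ps)))

# Illustrative values and weights (not from the paper).
xs = [1.0, 2.0, 4.0]
ps = [0.25, 0.5, 0.25]

# A linear psi gives back the ordinary weighted arithmetic mean.
arithmetic = kn_mean(xs, ps, lambda x: 2.0 * x + 3.0, lambda y: (y - 3.0) / 2.0)
# psi = ln gives the weighted geometric mean, psi = 1/x the harmonic mean.
geometric = kn_mean(xs, ps, math.log, math.exp)
harmonic = kn_mean(xs, ps, lambda x: 1.0 / x, lambda y: 1.0 / y)
print(arithmetic, geometric, harmonic)
```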

The following theorem qualifies quasilinear means. 

Theorem 2.1. If $\psi$ is continuous and strictly monotone in $a \leq x \leq b$, $a \leq x_k \leq b$,
$k = 1, \ldots, n$, $p_k > 0$ and $\sum_{k=1}^{n} p_k = 1$, then there exists a unique $x_0 \in (a, b)$ such that

$$ \psi(x_0) = \sum_{k=1}^{n} p_k \psi(x_k) $$

and $x_0$ is greater than some and less than others of the $x_k$ unless all $x_k$ are equal.

§ Kolmogorov [7] and Nagumo [?] first characterized the quasilinear mean $\langle x \rangle_\psi$ for a vector
$(x_1, \ldots, x_n)$ as $\langle x \rangle_\psi = \psi^{-1}\left( \sum_{k=1}^{n} \frac{1}{n} \psi(x_k) \right)$, where $\psi$ is a continuous and
strictly monotone function. De Finetti [3] extended their result to the case of simple (finite)
probability distributions. The version of the quasilinear mean representation theorem referred
to in §4 is due to Hardy, Littlewood and Polya [10], which followed closely the approach of
de Finetti. Aczel proved a characterization of the quasilinear mean using functional equations.
Ben-Tal [12] showed that quasilinear means are ordinary arithmetic means under suitably
defined addition and scalar multiplication operations. Norris [13] surveyed quasilinear means
and their more restrictive forms in Statistics. A more recent survey of generalized means can
be found in [?]. Applications of quasilinear means can be found in economics (for example,
[15]) and decision theory (for example, [?]). Recently Czachor and Naudts [?] studied
generalized thermostatistics based on quasilinear means.
Thus, the mean $\langle \cdot \rangle_\psi$ is determined when the function $\psi$ is given. We may ask
whether the converse is true: if $\langle X \rangle_{\psi_1} = \langle X \rangle_{\psi_2}$ for all $X \in \mathcal{X}$, is $\psi_1$ necessarily
the same function as $\psi_2$? First we give the following definition.

Definition 2.2. Continuous and strictly monotone functions $\psi_1$ and $\psi_2$ are said to be
KN-equivalent if $\langle X \rangle_{\psi_1} = \langle X \rangle_{\psi_2}$ for all $X \in \mathcal{X}$.

Note that when we compare two means, it is to be understood that the underlying
probabilities are the same. The following theorem characterizes KN-equivalent functions.

Theorem 2.3. In order that two continuous and strictly monotone functions $\psi_1$ and
$\psi_2$ are KN-equivalent, it is necessary and sufficient that

$$ \psi_1 = \alpha \psi_2 + \beta , $$

where $\alpha$ and $\beta$ are constants and $\alpha \neq 0$.
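The sufficiency direction of Theorem 2.3 is easy to observe numerically. In the sketch below, the helper `kn_mean`, the constants `a` and `b`, and the sample data are all illustrative assumptions: an affine transform of $e^x$ yields the same mean as $e^x$, whereas $x^3$, which is not an affine transform of $e^x$, yields a different one.

```python
import math

def kn_mean(xs, ps, psi, psi_inv):
    """Quasilinear (KN) mean: psi_inv(sum_k p_k * psi(x_k))."""
    return psi_inv(sum(p * psi(x) for x, p in zip(xs, ps)))

# Illustrative values and weights (not from the paper).
xs = [0.2, 1.0, 1.7]
ps = [0.1, 0.6, 0.3]

# psi1(x) = exp(x), and psi2 = a * psi1 + b with a != 0 (a is negative here).
a, b = -3.0, 7.0
m1 = kn_mean(xs, ps, math.exp, math.log)
m2 = kn_mean(xs, ps, lambda x: a * math.exp(x) + b, lambda y: math.log((y - b) / a))

# psi3(x) = x**3 is strictly monotone but not an affine transform of exp(x).
m3 = kn_mean(xs, ps, lambda x: x ** 3, lambda y: y ** (1.0 / 3.0))
print(m1, m2, m3)  # m1 and m2 coincide up to rounding, m3 does not
```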

Corollary 2.4. Let $\psi$ be a KN-function; then $\langle X \rangle_\psi = \langle X \rangle_{-\psi}$.

Hence, whenever required, without loss of generality, one can assume that $\psi$ is
an increasing function. The following theorem characterizes additivity of quasilinear
means.

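Theorem 2.5. Let $\psi$ be a KN-function and c be a real constant; then $\langle X + c \rangle_\psi = \langle X \rangle_\psi + c$,
i.e.,

$$ \psi^{-1}\left( \sum_{k=1}^{n} p_k \psi(x_k + c) \right) = \psi^{-1}\left( \sum_{k=1}^{n} p_k \psi(x_k) \right) + c , $$

if and only if $\psi$ is either linear or exponential.

Proofs of Theorems 2.1, 2.3 and 2.5 can be found in the book on inequalities by
Hardy, Littlewood and Polya [10].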
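A small numerical illustration of Theorem 2.5, under assumed sample data: the translation property holds for a linear and for an exponential KN-function, and fails for $\psi(x) = x^3$. The helper names and the chosen exponent 0.5 are hypothetical.

```python
import math

def kn_mean(xs, ps, psi, psi_inv):
    """Quasilinear (KN) mean: psi_inv(sum_k p_k * psi(x_k))."""
    return psi_inv(sum(p * psi(x) for x, p in zip(xs, ps)))

def shift_gap(xs, ps, c, psi, psi_inv):
    """<X + c>_psi - (<X>_psi + c); zero means the translation property holds."""
    shifted = kn_mean([x + c for x in xs], ps, psi, psi_inv)
    return shifted - (kn_mean(xs, ps, psi, psi_inv) + c)

# Illustrative data (not from the paper).
xs, ps, c = [1.0, 2.0], [0.5, 0.5], 1.0

gap_linear = shift_gap(xs, ps, c, lambda x: x, lambda y: y)
gap_exp = shift_gap(xs, ps, c, lambda x: math.exp(0.5 * x), lambda y: 2.0 * math.log(y))
gap_cubic = shift_gap(xs, ps, c, lambda x: x ** 3, lambda y: y ** (1.0 / 3.0))
print(gap_linear, gap_exp, gap_cubic)  # first two vanish up to rounding, third does not
```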

2.3. Renyi Entropy 

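In the definition of Shannon entropy (3), if the standard mean of the Hartley function H
is replaced with the quasilinear mean (5), one obtains a generalized measure of
information of the r.v. X with respect to a KN-function $\psi$ as

$$ S_\psi(X) = \psi^{-1}\left( \sum_{k=1}^{n} p_k \psi\left( \ln \frac{1}{p_k} \right) \right) = \psi^{-1}\left( \sum_{k=1}^{n} p_k \psi(H_k) \right) , \tag{6} $$

where $\psi$ is a KN-function. We refer to (6) as the quasilinear entropy with respect to the
KN-function $\psi$. If we impose the constraint of additivity on $S_\psi$, then $\psi$ should satisfy [2]

$$ \langle X + c \rangle_\psi = \langle X \rangle_\psi + c , \tag{7} $$

for any random variable $X \in \mathcal{X}$ and a constant c.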

Renyi employed this formalism to define a one-parameter family of measures of
information (α-entropies) as follows:

$$ S_\alpha(X) = \frac{1}{1 - \alpha} \ln \left( \sum_{k=1}^{n} p_k^{\alpha} \right) , \tag{8} $$

where the KN-function $\psi$ is chosen in (6) as $\psi(x) = e^{(1-\alpha)x}$, a choice motivated
by Theorem 2.5. If we choose $\psi$ as a linear function in the quasilinear entropy (6), what
we get is Shannon entropy. Renyi entropy is a one-parameter generalization of Shannon
entropy in the sense that the limit $\alpha \to 1$ in (8) retrieves Shannon entropy.
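The construction can be sketched in code as follows: the quasilinear mean of the Hartley values $\ln(1/p_k)$ with the exponential KN-function $\psi(x) = e^{(1-\alpha)x}$ coincides with the usual closed form of Renyi entropy, and values of α close to 1 approach Shannon entropy. Function names and the example pmf are assumptions for illustration.

```python
import math

def shannon(pmf):
    """Shannon entropy in nats."""
    return -sum(p * math.log(p) for p in pmf if p > 0)

def renyi_closed(pmf, alpha):
    """Closed form: ln(sum_k p_k**alpha) / (1 - alpha), for alpha != 1."""
    return math.log(sum(p ** alpha for p in pmf)) / (1.0 - alpha)

def renyi_kn(pmf, alpha):
    """KN mean of Hartley values ln(1/p_k) with psi(x) = exp((1 - alpha) * x)."""
    s = sum(p * math.exp((1.0 - alpha) * math.log(1.0 / p)) for p in pmf)
    return math.log(s) / (1.0 - alpha)  # psi^{-1}(y) = ln(y) / (1 - alpha)

pmf = [0.5, 0.3, 0.2]  # illustrative pmf, not from the paper
print(renyi_closed(pmf, 2.0), renyi_kn(pmf, 2.0))
print(renyi_closed(pmf, 1.0001), shannon(pmf))  # close to each other
```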

Despite its formal origin, Renyi entropy has proved important in a variety of practical
applications in coding theory [?], statistical inference [?], quantum mechanics [?] and
chaotic dynamical systems [?]. Thermodynamic properties of systems with multi-fractal
structures have been studied by extending the notion of Gibbs-Shannon entropy into a
more general framework - Renyi entropy [22].



3. Renyi's Recipe and Tsallis Entropy 

3.1. Tsallis Entropy 

Due to an increasing interest in long-range correlated systems and non-equilibrium
phenomena there has recently been much focus on the Tsallis (or nonextensive) entropy.
Although first introduced by Havrda and Charvat in the context of cybernetics
theory and later studied by Daroczy [21], it was Tsallis who exploited its nonextensive
features and placed it in a physical setting. Hence it is also known as
Havrda-Charvat-Daroczy-Tsallis entropy. Throughout this paper we refer to this as Tsallis
or nonextensive entropy. The Tsallis entropy of an r.v. $X \in \mathcal{X}$ with pmf $p = (p_1, \ldots, p_n)$ is
defined as

$$ S_q(X) = \frac{1 - \sum_{k=1}^{n} p_k^{q}}{q - 1} , \tag{9} $$

where $q > 0$ is called the nonextensive index. Tsallis entropy too, like Renyi entropy,
is a one-parameter generalization of Shannon entropy in the sense that $q \to 1$ in (9)
retrieves Shannon entropy. Tsallis entropy is concave for all $q > 0$, but Renyi entropy
is concave only for $0 < \alpha < 1$. The index q characterizes the degree of nonextensivity
reflected in the pseudo-additivity property

$$ S_q(X \times Y) = S_q(X) \oplus_q S_q(Y) = S_q(X) + S_q(Y) + (1 - q) S_q(X) S_q(Y) , \tag{10} $$

where $X, Y \in \mathcal{X}$ are two independent random variables.
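The pseudo-additivity property (10) and the $q \to 1$ limit can be checked with a few lines of code. This is a minimal sketch; the function names, the pmfs and the value q = 1.5 are illustrative assumptions.

```python
import math

def tsallis(pmf, q):
    """Tsallis entropy (1 - sum_k p_k**q) / (q - 1), for q != 1."""
    return (1.0 - sum(p ** q for p in pmf)) / (q - 1.0)

def pseudo_add(x, y, q):
    """Pseudo-addition: x + y + (1 - q) * x * y."""
    return x + y + (1.0 - q) * x * y

def shannon(pmf):
    """Shannon entropy in nats."""
    return -sum(p * math.log(p) for p in pmf if p > 0)

# Illustrative independent pmfs and index (not from the paper).
p, r, q = [0.5, 0.3, 0.2], [0.6, 0.4], 1.5
joint = [pi * rj for pi in p for rj in r]

lhs = tsallis(joint, q)                             # S_q(X x Y)
rhs = pseudo_add(tsallis(p, q), tsallis(r, q), q)   # S_q(X) (+)_q S_q(Y)
print(lhs, rhs)                                     # agree up to rounding
print(tsallis(p, 1.0001), shannon(p))               # close to each other
```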



3.2. Nongeneralizability of Tsallis Entropy 

Though the derivation of Tsallis entropy, when it was proposed in 1988 [?], is slightly
different, one can understand this generalization using the q-logarithm function (see (12)),
where one would first generalize the logarithm in the Hartley information to the q-logarithm
and define the q-Hartley function $\widetilde{H} : \{x_1, \ldots, x_n\} \to \mathbb{R}$ of the r.v. X as [?]

$$ \widetilde{H}_k = \widetilde{H}(x_k) = \ln_q \frac{1}{p_k} , \quad k = 1, \ldots, n. \tag{11} $$

The q-logarithm in (11) is defined as

$$ \ln_q(x) = \frac{x^{1-q} - 1}{1 - q} , \tag{12} $$

which satisfies pseudo-additivity of the form $\ln_q(xy) = \ln_q x \oplus_q \ln_q y$, and in the limit
$q \to 1$ we have $\ln_q x \to \ln x$. Now Tsallis entropy can be defined as the expectation
of the q-Hartley function $\widetilde{H}$ as

$$ S_q(X) = \langle \widetilde{H} \rangle . \tag{13} $$

Note that the characteristic pseudo-additivity property of Tsallis entropy (10) is a
consequence of the additivity property of the Hartley function.
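The three facts used above, namely pseudo-additivity of the q-logarithm, its $q \to 1$ limit, and Tsallis entropy as the linear mean of the q-Hartley function, can be sketched as follows. The names `ln_q`, `pseudo_add` and `tsallis_as_mean` and the numerical values are assumptions chosen for illustration.

```python
import math

def ln_q(x, q):
    """q-logarithm (x**(1 - q) - 1) / (1 - q), for x > 0 and q != 1."""
    return (x ** (1.0 - q) - 1.0) / (1.0 - q)

def pseudo_add(x, y, q):
    """Pseudo-addition: x + y + (1 - q) * x * y."""
    return x + y + (1.0 - q) * x * y

def tsallis_as_mean(pmf, q):
    """Linear expectation of the q-Hartley values ln_q(1/p_k), as in (13)."""
    return sum(p * ln_q(1.0 / p, q) for p in pmf if p > 0)

def tsallis_closed(pmf, q):
    """Closed form (9): (1 - sum_k p_k**q) / (q - 1)."""
    return (1.0 - sum(p ** q for p in pmf)) / (q - 1.0)

q = 0.7                    # illustrative index
pmf = [0.5, 0.3, 0.2]      # illustrative pmf
print(ln_q(6.0, q), pseudo_add(ln_q(2.0, q), ln_q(3.0, q), q))  # equal up to rounding
print(tsallis_as_mean(pmf, q), tsallis_closed(pmf, q))          # equal up to rounding
print(ln_q(5.0, 1.000001), math.log(5.0))                       # close to each other
```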

Before we present the main results of this paper, we briefly discuss the context of
quasilinear means in which there is a relation between Tsallis and Renyi entropy. The
q-Hartley function can be written as

$$ \widetilde{H}_k = \ln_q \frac{1}{p_k} = \phi_q(H_k) , $$

where

$$ \phi_q(x) = \frac{e^{(1-q)x} - 1}{1 - q} = \ln_q(e^x) . \tag{14} $$

Note that $\phi_q$ is KN-equivalent to $e^{(1-q)x}$ (by Theorem 2.3), the KN-function used in
Renyi entropy. Hence Tsallis entropy is related to Renyi entropy as

$$ S_q^{T} = \phi_q(S_q^{R}) , \tag{15} $$

where $S_q^{T}$ and $S_q^{R}$ denote the Tsallis and Renyi entropy respectively, with a real number
q as a parameter. Hence, Tsallis entropy and Renyi entropy are monotonic functions
of each other and, as a result, both must be maximized by the same probability
distribution.
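Relation (15) can be verified directly from the closed forms. A minimal sketch, with function names and example data assumed for illustration:

```python
import math

def renyi(pmf, q):
    """Renyi entropy ln(sum_k p_k**q) / (1 - q), for q != 1."""
    return math.log(sum(p ** q for p in pmf)) / (1.0 - q)

def tsallis(pmf, q):
    """Tsallis entropy (1 - sum_k p_k**q) / (q - 1), for q != 1."""
    return (1.0 - sum(p ** q for p in pmf)) / (q - 1.0)

def phi_q(x, q):
    """phi_q(x) = (exp((1 - q) * x) - 1) / (1 - q), i.e. ln_q(exp(x))."""
    return (math.exp((1.0 - q) * x) - 1.0) / (1.0 - q)

pmf = [0.5, 0.3, 0.2]  # illustrative pmf, not from the paper
for q in (0.5, 2.0):
    print(q, tsallis(pmf, q), phi_q(renyi(pmf, q), q))  # the two entropies agree
```

Since `phi_q` is strictly increasing in `x` for every `q != 1`, ranking distributions by either entropy gives the same order.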

Now a natural question that arises is whether one could generalize Tsallis entropy
using Renyi's recipe, i.e., by replacing the linear average in (13) by KN-averages and imposing
the condition of pseudo-additivity. This is equivalent to determining the KN-function $\psi$
for which the so-called q-quasilinear entropy, defined as

$$ \widetilde{S}_\psi(X) = \langle \widetilde{H} \rangle_\psi = \psi^{-1}\left[ \sum_{k=1}^{n} p_k \psi\left( \widetilde{H}_k \right) \right] , \tag{16} $$

where $\widetilde{H}_k = \widetilde{H}(x_k)$, $\forall k = 1, \ldots, n$, satisfies the pseudo-additive property.

First, we present the following result, which characterizes the pseudo-additivity of
quasilinear means.

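Theorem 3.1. Let $X, Y \in \mathcal{X}$ be two independent random variables. Let $\psi$ be any
KN-function. Then

$$ \langle X \oplus_q Y \rangle_\psi = \langle X \rangle_\psi \oplus_q \langle Y \rangle_\psi \tag{17} $$

if and only if $\psi$ is linear.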

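Proof. Let p and r be the pmfs of the random variables $X, Y \in \mathcal{X}$ respectively. The proof
of sufficiency is simple. For linear $\psi$ the quasilinear mean is the ordinary expectation, so

$$ \langle X \oplus_q Y \rangle_\psi = \langle X \oplus_q Y \rangle = \sum_{i=1}^{n} \sum_{j=1}^{n} p_i r_j \, (x_i \oplus_q y_j) , $$

and by the definition of $\oplus_q$, we have

$$ \langle X \oplus_q Y \rangle = \sum_{i=1}^{n} \sum_{j=1}^{n} p_i r_j \, (x_i + y_j + (1 - q) x_i y_j) $$
$$ = \sum_{i=1}^{n} p_i x_i + \sum_{j=1}^{n} r_j y_j + (1 - q) \sum_{i=1}^{n} p_i x_i \sum_{j=1}^{n} r_j y_j , $$

which is $\langle X \rangle \oplus_q \langle Y \rangle$.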

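To prove the converse, we need to determine all forms of $\psi$ which satisfy

$$ \psi^{-1}\left( \sum_{i=1}^{n} \sum_{j=1}^{n} p_i r_j \psi(x_i \oplus_q y_j) \right) = \psi^{-1}\left( \sum_{i=1}^{n} p_i \psi(x_i) \right) \oplus_q \psi^{-1}\left( \sum_{j=1}^{n} r_j \psi(y_j) \right) . \tag{18} $$

Since (18) must hold for arbitrary pmfs p, r and for arbitrary numbers $\{x_1, \ldots, x_n\}$
and $\{y_1, \ldots, y_n\}$, one can choose $y_j = c$ independently of j. Then (18) yields

$$ \psi^{-1}\left( \sum_{k=1}^{n} p_k \psi(x_k \oplus_q c) \right) = \psi^{-1}\left( \sum_{k=1}^{n} p_k \psi(x_k) \right) \oplus_q c . \tag{19} $$

That is, $\psi$ should satisfy

$$ \langle X \oplus_q c \rangle_\psi = \langle X \rangle_\psi \oplus_q c , \tag{20} $$

for any $X \in \mathcal{X}$ and any constant c. This can be rearranged as

$$ \langle (1 + (1 - q)c) X + c \rangle_\psi = (1 + (1 - q)c) \langle X \rangle_\psi + c $$

by using the definition of $\oplus_q$. Since q is independent of the other quantities, $\psi$ should satisfy
an equation of the form

$$ \langle dX + c \rangle_\psi = d \langle X \rangle_\psi + c , \tag{21} $$

where $d \neq 0$ (by writing $d = 1 + (1 - q)c$). Finally $\psi$ must satisfy

$$ \langle X + c \rangle_\psi = \langle X \rangle_\psi + c \tag{22} $$

and

$$ \langle dX \rangle_\psi = d \langle X \rangle_\psi , \tag{23} $$

for any $X \in \mathcal{X}$ and any constants d, c. From Theorem 2.5, the condition (22) is satisfied
only when $\psi$ is linear or exponential.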

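To complete the proof we have to show that KN-averages do not satisfy condition
(23) when $\psi$ is exponential. For a particular choice of $\psi(x) = e^{(1-\alpha)x}$, assume that

$$ \langle dX \rangle_\psi = d \langle X \rangle_\psi , \tag{24} $$

where

$$ \langle dX \rangle_\psi = \frac{1}{1 - \alpha} \ln \left( \sum_{k=1}^{n} p_k e^{(1-\alpha) d x_k} \right) $$

and

$$ d \langle X \rangle_\psi = \frac{d}{1 - \alpha} \ln \left( \sum_{k=1}^{n} p_k e^{(1-\alpha) x_k} \right) . $$

Now define a KN-function $\psi'$ as $\psi'(x) = e^{(1-\alpha) d x}$, for which

$$ \langle X \rangle_{\psi'} = \frac{1}{d(1 - \alpha)} \ln \left( \sum_{k=1}^{n} p_k e^{(1-\alpha) d x_k} \right) . $$

Condition (24) implies

$$ \langle X \rangle_\psi = \langle X \rangle_{\psi'} $$

for all $X \in \mathcal{X}$, so $\psi$ and $\psi'$ are KN-equivalent. By Theorem 2.3 this would require
$\psi' = a\psi + b$ for constants a and b with $a \neq 0$, which fails for $d \neq 1$ and gives a contradiction.

□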
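Theorem 3.1 can be illustrated numerically. In the sketch below, all names and sample values are assumptions: the gap between the two sides of (17) vanishes for a linear KN-function but not for $\psi(x) = e^x$, and the scaling condition (23) likewise fails for the exponential choice.

```python
import math

def kn_mean(xs, ps, psi, psi_inv):
    """Quasilinear (KN) mean: psi_inv(sum_k p_k * psi(x_k))."""
    return psi_inv(sum(p * psi(x) for x, p in zip(xs, ps)))

def pseudo_add(x, y, q):
    """Pseudo-addition: x + y + (1 - q) * x * y."""
    return x + y + (1.0 - q) * x * y

def pseudo_additivity_gap(xs, p, ys, r, q, psi, psi_inv):
    """<X (+)_q Y>_psi minus <X>_psi (+)_q <Y>_psi for independent X and Y."""
    joint_vals = [pseudo_add(x, y, q) for x in xs for y in ys]
    joint_probs = [pi * rj for pi in p for rj in r]
    lhs = kn_mean(joint_vals, joint_probs, psi, psi_inv)
    rhs = pseudo_add(kn_mean(xs, p, psi, psi_inv), kn_mean(ys, r, psi, psi_inv), q)
    return lhs - rhs

# Illustrative data (not from the paper).
xs, p = [0.0, 1.0], [0.5, 0.5]
ys, r = [0.0, 1.0], [0.5, 0.5]
q = 2.0

gap_linear = pseudo_additivity_gap(xs, p, ys, r, q, lambda x: x, lambda y: y)
gap_exp = pseudo_additivity_gap(xs, p, ys, r, q, math.exp, math.log)

# Scaling condition (23) with d = 2 for psi(x) = exp(x).
d = 2.0
scale_gap = kn_mean([d * x for x in xs], p, math.exp, math.log) - d * kn_mean(xs, p, math.exp, math.log)
print(gap_linear, gap_exp, scale_gap)  # only the first is zero up to rounding
```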



One can observe that the above proof avoids solving functional equations as in the
case of Theorem 2.5 (see [6]). Instead it makes use of basic results on KN-averages. The
following corollary is an immediate consequence of Theorem 3.1.

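Corollary 3.2. The q-quasilinear entropy $\widetilde{S}_\psi$ (defined as in (16)) with respect to a
KN-function $\psi$ satisfies pseudo-additivity if and only if $\widetilde{S}_\psi$ is Tsallis entropy.

Proof. Let $X, Y \in \mathcal{X}$ be two independent random variables and let p, r be their
corresponding pmfs. By the pseudo-additivity constraint, $\psi$ should satisfy

$$ \widetilde{S}_\psi(X \times Y) = \widetilde{S}_\psi(X) \oplus_q \widetilde{S}_\psi(Y) . \tag{25} $$

From the property of the q-logarithm that $\ln_q xy = \ln_q x \oplus_q \ln_q y$, we need

$$ \psi^{-1}\left( \sum_{i=1}^{n} \sum_{j=1}^{n} p_i r_j \psi\left( \ln_q \frac{1}{p_i r_j} \right) \right) = \psi^{-1}\left( \sum_{i=1}^{n} p_i \psi\left( \ln_q \frac{1}{p_i} \right) \right) \oplus_q \psi^{-1}\left( \sum_{j=1}^{n} r_j \psi\left( \ln_q \frac{1}{r_j} \right) \right) . $$

Equivalently, we need

$$ \psi^{-1}\left( \sum_{i=1}^{n} \sum_{j=1}^{n} p_i r_j \psi\left( \widetilde{H}^{p}_i \oplus_q \widetilde{H}^{r}_j \right) \right) = \psi^{-1}\left( \sum_{i=1}^{n} p_i \psi\left( \widetilde{H}^{p}_i \right) \right) \oplus_q \psi^{-1}\left( \sum_{j=1}^{n} r_j \psi\left( \widetilde{H}^{r}_j \right) \right) , $$

where $\widetilde{H}^{p}$ and $\widetilde{H}^{r}$ represent the q-Hartley functions corresponding to the probability
distributions p and r respectively. That is, $\psi$ should satisfy

$$ \langle \widetilde{H}^{p} \oplus_q \widetilde{H}^{r} \rangle_\psi = \langle \widetilde{H}^{p} \rangle_\psi \oplus_q \langle \widetilde{H}^{r} \rangle_\psi . \tag{26} $$

From Theorem 3.1, $\psi$ is then linear and hence $\widetilde{S}_\psi$ is Tsallis entropy. □

Corollary 3.2 shows that using Renyi's recipe in the nonextensive case one can
prepare only Tsallis entropy, while in the classical case there are two possibilities.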
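The statement of Corollary 3.2 can be probed with a small experiment on the q-quasilinear entropy (16). This is a sketch under assumed names and data: with a linear KN-function the entropy is Tsallis entropy and is pseudo-additive, while with the exponential choice $\psi(x) = e^x$ pseudo-additivity fails for the non-uniform pmfs below.

```python
import math

def ln_q(x, q):
    """q-logarithm (x**(1 - q) - 1) / (1 - q), for x > 0 and q != 1."""
    return (x ** (1.0 - q) - 1.0) / (1.0 - q)

def pseudo_add(x, y, q):
    """Pseudo-addition: x + y + (1 - q) * x * y."""
    return x + y + (1.0 - q) * x * y

def q_quasilinear_entropy(pmf, q, psi, psi_inv):
    """KN mean of the q-Hartley values ln_q(1/p_k), as in (16)."""
    return psi_inv(sum(p * psi(ln_q(1.0 / p, q)) for p in pmf))

def tsallis(pmf, q):
    """Closed form (9): (1 - sum_k p_k**q) / (q - 1)."""
    return (1.0 - sum(p ** q for p in pmf)) / (q - 1.0)

# Illustrative non-uniform pmfs and index (not from the paper).
p, r, q = [0.7, 0.3], [0.6, 0.4], 2.0
joint = [pi * rj for pi in p for rj in r]

def gap(psi, psi_inv):
    """Entropy of the joint pmf minus the pseudo-sum of the marginal entropies."""
    lhs = q_quasilinear_entropy(joint, q, psi, psi_inv)
    rhs = pseudo_add(q_quasilinear_entropy(p, q, psi, psi_inv),
                     q_quasilinear_entropy(r, q, psi, psi_inv), q)
    return lhs - rhs

gap_linear = gap(lambda x: x, lambda y: y)
gap_exp = gap(math.exp, math.log)
print(gap_linear, gap_exp)  # zero up to rounding for linear psi, nonzero for exp
```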

4. A Characterization Theorem for Tsallis Entropy 

The importance of Renyi's formalism for generalizing Shannon entropy lies in a characterization
of Shannon entropy in terms of axioms of quasilinear means. By the result,
Theorem 3.1, that we presented in this paper, one can give a characterization of Tsallis
entropy in terms of axioms of quasilinear means. For such a characterization one would
assume that entropy is the expectation of a function of the underlying r.v. In the classical
case, the function is the Hartley function, while in the nonextensive case it is the q-Hartley
function.

Since the characterization of quasilinear means is given in terms of the cumulative
distribution of a random variable, we use the following definitions and notation.

Let $F : \mathbb{R} \to \mathbb{R}$ denote the cumulative distribution function of the random variable
$X \in \mathcal{X}$. Corresponding to a KN-function $\psi : \mathbb{R} \to \mathbb{R}$, the generalized mean of F (or X) can
be written as

$$ \langle X \rangle_\psi = \psi^{-1}\left( \int \psi \, dF \right) , $$

which is the continuous analogue of (5) and is axiomatized by Kolmogorov, Nagumo and
De Finetti (see [?, Theorem 215]) as follows.

Theorem 4.1. Let $\mathcal{F}_I$ be the set of all cumulative distribution functions defined on some
interval I of the real line $\mathbb{R}$. A functional $\kappa : \mathcal{F}_I \to \mathbb{R}$ satisfies the following axioms:

axiom 1: $\kappa(\delta_x) = x$, where $\delta_x \in \mathcal{F}_I$ denotes the step function at x (Consistency with
certainty),

axiom 2: $F, G \in \mathcal{F}_I$, if $F \leq G$ then $\kappa(F) \leq \kappa(G)$; the equality holds if and only if
F = G (Monotonicity) and,

axiom 3: $F, G \in \mathcal{F}_I$, if $\kappa(F) = \kappa(G)$ then $\kappa(\beta F + (1 - \beta)H) = \kappa(\beta G + (1 - \beta)H)$, for
any $H \in \mathcal{F}_I$ (Quasilinearity)

if and only if there is a continuous strictly monotone function $\psi$ such that

$$ \kappa(F) = \psi^{-1}\left( \int \psi \, dF \right) . $$

The modified axioms for the quasilinear mean can be found in [?]. Now we give
our characterization theorem for Tsallis entropy, which is similar to the characterization
of Shannon entropy given by Renyi [2].

Theorem 4.2. Let $X \in \mathcal{X}$ be a random variable. An information measure defined as a
(generalized) mean $\kappa$ of the q-Hartley function of X is Tsallis entropy if and only if

(i) $\kappa$ satisfies the axioms of quasilinear means given in Theorem 4.1 and,

(ii) if $X, Y \in \mathcal{X}$ are two random variables which are independent, then

$$ \kappa(\widetilde{H}(X \times Y)) = \kappa(\widetilde{H}(X)) \oplus_q \kappa(\widetilde{H}(Y)) . \tag{27} $$

Theorem 4.2 is a direct consequence of Theorems 3.1 and 4.1. This characterization
of Tsallis entropy only replaces the additivity constraint in the characterization of
Shannon entropy given by Renyi in [2] with pseudo-additivity, and it does not
make use of the postulate $\kappa(H) + \kappa(-H) = 0$. (This postulate is needed to distinguish
Shannon entropy from Renyi entropy.) This is possible because Tsallis entropy is unique
by means of KN-averages and under pseudo-additivity.

5. Conclusions 

Passing an information measure through Renyi's formalism - the procedure followed by Renyi
to generalize Shannon entropy - allows one to study the possible generalizations and to
characterize the information measure in terms of axioms of quasilinear means.
In this paper we studied this technique for nonextensive entropy and showed that Tsallis
entropy is unique under Renyi's recipe. Considering the attempts to study generalized
thermostatistics based on KN-averages (for example [?]), the results presented in this
paper further the relation between entropic measures and generalized averages.